Skeleton Code

The code below provides a skeleton for the model building & training component of your project. You can add/remove/build on code however you see fit, this is meant as a starting point.

This notebook was initialized and executed locally on a workstation equipped with an NVIDIA RTX 3080 Founders Edition GPU.

In [1]:
import numpy as np # Linear algebra.
import pandas as pd # Data processing, CSV file I/O (e.g. pd.read_csv).
import os
from glob import glob

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg # Read png files.

# Set the plotting style of figures.
plt.style.use('ggplot')

# Plotting pretty figures and avoid blurry images.
%config InlineBackend.figure_format='retina'
In [2]:
## Import any other stats/DL/ML packages you may need here. E.g. Keras, scikit-learn, etc.
import math
import random
from random import sample
import csv

import sklearn.model_selection as skl
from sklearn.metrics import roc_curve, auc, precision_recall_curve, confusion_matrix
from sklearn.metrics import plot_precision_recall_curve, average_precision_score, f1_score
from sklearn.preprocessing import binarize

import scipy
from scipy.ndimage import gaussian_filter

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.applications.vgg16 import VGG16
# from tensorflow.keras.applications.vgg19 import VGG19
# from tensorflow.keras.applications.resnet import ResNet50
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, EarlyStopping, ReduceLROnPlateau

from tensorflow import keras
from keras.layers import GlobalAveragePooling2D, Dense, Dropout, Flatten, Conv2D, MaxPool2D, MaxPooling2D, BatchNormalization
from keras.utils.vis_utils import plot_model
In [3]:
RND_STATE = 42 # Random state used for reproducibility.

Notebook environment and GPU support:

In [4]:
!nvidia-smi
Sun Nov 21 19:35:07 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 496.76       Driver Version: 496.76       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:59:00.0  On |                  N/A |
| 40%   31C    P8    27W / 320W |   1347MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3576    C+G   ...8wekyb3d8bbwe\Cortana.exe    N/A      |
|    0   N/A  N/A      3680    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A      5476    C+G   ...054.29\msedgewebview2.exe    N/A      |
|    0   N/A  N/A      8684    C+G   ...us Core\Glorious Core.exe    N/A      |
|    0   N/A  N/A     16568    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     16920    C+G   ...054.29\msedgewebview2.exe    N/A      |
|    0   N/A  N/A     17540    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A     17692    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A     19904    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     22868    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A     25528    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    0   N/A  N/A     27988    C+G   ...lack\app-4.22.0\slack.exe    N/A      |
|    0   N/A  N/A     30396    C+G   ...arp.BrowserSubprocess.exe    N/A      |
|    0   N/A  N/A     31348    C+G   ...ekyb3d8bbwe\YourPhone.exe    N/A      |
|    0   N/A  N/A     31580    C+G   ...n1h2txyewy\SearchHost.exe    N/A      |
|    0   N/A  N/A     33084    C+G   ... Host\Razer Synapse 3.exe    N/A      |
|    0   N/A  N/A     35672    C+G   ...batNotificationClient.exe    N/A      |
|    0   N/A  N/A     36520    C+G   ...ser\Application\brave.exe    N/A      |
|    0   N/A  N/A     37924    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A     42064    C+G   ...root\Office16\OUTLOOK.EXE    N/A      |
|    0   N/A  N/A     44076    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     46860    C+G   ...root\Office16\WINWORD.EXE    N/A      |
|    0   N/A  N/A     52308    C+G   ...3d8bbwe\CalculatorApp.exe    N/A      |
+-----------------------------------------------------------------------------+
In [5]:
import subprocess

str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
Out[5]:
1
In [6]:
from __future__ import absolute_import, division, print_function, unicode_literals

executing_eagerly = tf.executing_eagerly()

print("TensorFlow version: ", tf.__version__)
print("Eager mode: ", executing_eagerly)
TensorFlow version:  2.7.0
Eager mode:  True
In [7]:
from tensorflow.python.client import device_lib

def get_available_gpus():
    
    local_device_protos = device_lib.list_local_devices()
    
    return [x.name for x in local_device_protos if x.device_type == 'GPU']


print("Are GPUs available? ", tf.config.experimental.list_physical_devices("GPU")) 
print("Num of GPUs available: ", len(tf.config.list_physical_devices('GPU')))
get_available_gpus()
Are GPUs available?  [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Num of GPUs available:  1
Out[7]:
['/device:GPU:0']
In [8]:
import warnings
warnings.filterwarnings("ignore")

Do some early processing of your metadata for easier model training:

In [9]:
data_dir = './data' # Path to data directory.
train_dir = data_dir + './train' # Path to train directory.
test_dir = data_dir + './test' # Path to test directory.
In [10]:
## Display a sample chest x-ray image.
plt.figure(figsize=(16,16))
plt.imshow(load_img('./data/images_001/images/00000057_003.png'))
plt.show()
In [11]:
## Below is some helper code to read all of your full image filepaths into a dataframe for easier manipulation.
## Load the NIH data to all_xray_df.
all_xray_df = pd.read_csv('./data/Data_Entry_2017.csv')
all_image_paths = {os.path.basename(x): x for x in 
                   glob(os.path.join('./data', 'images*', '*', '*.png'))}
print('Scans found:', len(all_image_paths), ', Total Headers', all_xray_df.shape[0])
all_xray_df['path'] = all_xray_df['Image Index'].map(all_image_paths.get)
all_xray_df.sample(3, random_state=RND_STATE)
Scans found: 112120 , Total Headers 112120
Out[11]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x y] Unnamed: 11 path
89645 00022260_003.png No Finding 3 22260 54 F PA 2542 2434 0.143 0.143 NaN ./data\images_010\images\00022260_003.png
47446 00012048_007.png Infiltration|Mass|Nodule 7 12048 65 M AP 2500 2048 0.168 0.168 NaN ./data\images_006\images\00012048_007.png
57963 00014352_005.png No Finding 5 14352 6 M PA 2992 2033 0.143 0.143 NaN ./data\images_007\images\00014352_005.png
In [12]:
## Here you may want to create some extra columns in your table with binary indicators of certain diseases 
## rather than working directly with the 'Finding Labels' column.

# Todo
findings = set()
for f in all_xray_df['Finding Labels'].unique():
    findings.update(f.split('|'))
print(f'Total number of single diagnoses in the data: {len(findings)}')
Total number of single diagnoses in the data: 15
In [13]:
for label in findings:
    all_xray_df[label] = all_xray_df['Finding Labels'].map(lambda finding: 1. if label in finding else 0)
    print(f'%s: %d'%(label, int(all_xray_df[label].sum())))
Pneumothorax: 5302
Fibrosis: 1686
Cardiomegaly: 2776
Pleural_Thickening: 3385
Edema: 2303
Pneumonia: 1431
Infiltration: 19894
Consolidation: 4667
Nodule: 6331
Effusion: 13317
Hernia: 227
Emphysema: 2516
Mass: 5782
No Finding: 60361
Atelectasis: 11559
In [14]:
# for finding in findings:
#     all_xray_df[finding] = all_xray_df['Finding Labels'].map(lambda x: 1. if finding in x else 0)

all_xray_df.head()
Out[14]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x ... Pneumonia Infiltration Consolidation Nodule Effusion Hernia Emphysema Mass No Finding Atelectasis
0 00000001_000.png Cardiomegaly 0 1 58 M PA 2682 2749 0.143 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 00000001_001.png Cardiomegaly|Emphysema 1 1 58 M PA 2894 2729 0.143 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
2 00000001_002.png Cardiomegaly|Effusion 2 1 58 M PA 2500 2048 0.168 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
3 00000002_000.png No Finding 0 2 81 M PA 2500 2048 0.171 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 00000003_000.png Hernia 0 3 81 F PA 2582 2991 0.143 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0

5 rows × 28 columns

In [15]:
## Here we can create a new column called 'pneumonia_class' that will allow us to look at 
## images with or without pneumonia for binary classification.

# Todo
all_xray_df['pneumonia_class'] = all_xray_df.apply(
    lambda x: 'Positive' if x['Pneumonia']==1. else 'Negative', axis=1)

all_xray_df.head()
Out[15]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x ... Infiltration Consolidation Nodule Effusion Hernia Emphysema Mass No Finding Atelectasis pneumonia_class
0 00000001_000.png Cardiomegaly 0 1 58 M PA 2682 2749 0.143 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Negative
1 00000001_001.png Cardiomegaly|Emphysema 1 1 58 M PA 2894 2729 0.143 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 Negative
2 00000001_002.png Cardiomegaly|Effusion 2 1 58 M PA 2500 2048 0.168 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Negative
3 00000002_000.png No Finding 0 2 81 M PA 2500 2048 0.171 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
4 00000003_000.png Hernia 0 3 81 F PA 2582 2991 0.143 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 Negative

5 rows × 29 columns

In [16]:
all_xray_df[all_xray_df['pneumonia_class']=='Positive']
Out[16]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x ... Infiltration Consolidation Nodule Effusion Hernia Emphysema Mass No Finding Atelectasis pneumonia_class
48 00000013_010.png Effusion|Pneumonia|Pneumothorax 10 13 60 M AP 3056 2544 0.139 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Positive
126 00000032_012.png Atelectasis|Consolidation|Edema|Pneumonia 12 32 55 F AP 2500 2048 0.168 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Positive
253 00000056_000.png Nodule|Pneumonia 0 56 76 M PA 2500 2048 0.168 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive
276 00000061_012.png Edema|Effusion|Infiltration|Pleural_Thickening... 12 61 77 M AP 3056 2544 0.139 ... 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Positive
279 00000061_015.png Pneumonia 15 61 77 M AP 3056 2544 0.139 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
111557 00030536_007.png Atelectasis|Consolidation|Pneumonia 7 30536 56 F AP 3056 2544 0.139 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Positive
111627 00030570_001.png Edema|Infiltration|Pneumonia 1 30570 29 F AP 3056 2544 0.139 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive
111767 00030621_002.png Pneumonia 2 30621 22 F AP 3056 2544 0.139 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive
111845 00030637_016.png Consolidation|Pneumonia 16 30637 48 M AP 3056 2544 0.139 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive
112115 00030801_001.png Mass|Pneumonia 1 30801 39 M PA 2048 2500 0.168 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 Positive

1431 rows × 29 columns

In [17]:
all_xray_df[all_xray_df['pneumonia_class']=='Positive'].shape
Out[17]:
(1431, 29)
In [18]:
all_xray_df[all_xray_df['pneumonia_class']=='Negative']
Out[18]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x ... Infiltration Consolidation Nodule Effusion Hernia Emphysema Mass No Finding Atelectasis pneumonia_class
0 00000001_000.png Cardiomegaly 0 1 58 M PA 2682 2749 0.143 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Negative
1 00000001_001.png Cardiomegaly|Emphysema 1 1 58 M PA 2894 2729 0.143 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 Negative
2 00000001_002.png Cardiomegaly|Effusion 2 1 58 M PA 2500 2048 0.168 ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Negative
3 00000002_000.png No Finding 0 2 81 M PA 2500 2048 0.171 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
4 00000003_000.png Hernia 0 3 81 F PA 2582 2991 0.143 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 Negative
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
112114 00030801_000.png No Finding 0 30801 39 M PA 2500 2048 0.168 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
112116 00030802_000.png No Finding 0 30802 29 M PA 2048 2500 0.168 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
112117 00030803_000.png No Finding 0 30803 42 F PA 2048 2500 0.168 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
112118 00030804_000.png No Finding 0 30804 30 F PA 2048 2500 0.168 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative
112119 00030805_000.png No Finding 0 30805 27 M PA 2048 2500 0.171 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative

110689 rows × 29 columns

In [19]:
all_xray_df[all_xray_df['pneumonia_class']=='Negative'].shape
Out[19]:
(110689, 29)

Create your training and testing data:

In [20]:
def create_splits(**vargs):
    
    ## Either build your own or use a built-in library to split your original dataframe into two sets 
    ## that can be used for training and testing your model.
    ## It's important to consider here how balanced or imbalanced you want each of those sets to be
    ## for the presence of pneumonia.
    
    # Todo
    df = vargs['df']
    
    train_data, val_data = skl.train_test_split(df, test_size = .2, stratify=df['pneumonia_class'])
    train_data_ratio = len(train_data[train_data['pneumonia_class']=='Positive'])/len(train_data)
    val_data_ratio = len(val_data[val_data['pneumonia_class']=='Positive'])/len(val_data)
    print(f'Training set pneumonia: {100.*train_data_ratio: .2f}%\nValidation set pneumonia: {100.*val_data_ratio: .2f}%')
    
    # Make training set contain same number of positive and negative cases (balanced).
    train_pos_inds = train_data[train_data['pneumonia_class']=='Positive'].index.tolist()
    train_neg_inds = train_data[train_data['pneumonia_class']=='Negative'].index.tolist()

    # Randomly sample the train_neg_sample list to be of the same length as train_pos_inds.
    # Use the seed function for reproducibility.
    random.seed(RND_STATE)

    train_neg_sample = sample(train_neg_inds, len(train_pos_inds))
    train_data = train_data.loc[train_pos_inds+train_neg_sample]

    train_data_ratio = len(train_data[train_data['pneumonia_class']=='Positive'])/len(train_data)
    print(f'Training set corrected (50/50), Pneumonia: {100.*train_data_ratio: .2f}%')
    
    # Make validation set contain 80% positive and 20% negative cases.
    val_pos_inds = val_data[val_data['pneumonia_class']=='Positive'].index.tolist()
    val_neg_inds = val_data[val_data['pneumonia_class']=='Negative'].index.tolist()

    val_neg_sample = sample(val_neg_inds, 4*len(val_pos_inds))
    val_data = val_data.loc[val_pos_inds+val_neg_sample]

    val_data_ratio = len(val_data[val_data['pneumonia_class']=='Positive'])/len(val_data)
    print(f'Validation set corrected (20/80), Pneumonia: {100.*val_data_ratio :.2f}%')
    
    return train_data, val_data
In [21]:
train_data, val_data = create_splits(df=all_xray_df)

print(f'Training set size: {len(train_data)}\nValidation set size: {len(val_data)}')
Training set pneumonia:  1.28%
Validation set pneumonia:  1.28%
Training set corrected (50/50), Pneumonia:  50.00%
Validation set corrected (20/80), Pneumonia: 20.00%
Training set size: 2290
Validation set size: 1430
In [22]:
(train_data['pneumonia_class']=='Positive').value_counts()
Out[22]:
True     1145
False    1145
Name: pneumonia_class, dtype: int64
In [23]:
(val_data['pneumonia_class']=='Negative').value_counts()
Out[23]:
True     1144
False     286
Name: pneumonia_class, dtype: int64

We concur that both train_data and val_data have the correct number of positives/negative Pneumonia cases, in each set.

In [24]:
val_data['Patient Gender'].value_counts()
Out[24]:
M    846
F    584
Name: Patient Gender, dtype: int64

Checking the train_data distribution for changes in the Age distribution of Male patients diagnosed with Pneumonia:

In [25]:
scipy.stats.ttest_ind(
    all_xray_df['Patient Age'][(all_xray_df['Pneumonia']==True) & (all_xray_df['Patient Gender']=='M')],
    train_data['Patient Age'][(train_data['Pneumonia']==True) & (train_data['Patient Gender']=='M')])
Out[25]:
Ttest_indResult(statistic=-0.11124603193839631, pvalue=0.9114361230426931)

Checking the train_data distribution for changes in the Age distribution of Female patients diagnosed with Pneumonia:

In [26]:
scipy.stats.ttest_ind(
    all_xray_df['Patient Age'][(all_xray_df['Pneumonia']==True) & (all_xray_df['Patient Gender']=='F')],
    train_data['Patient Age'][(train_data['Pneumonia']==True) & (train_data['Patient Gender']=='F')])
Out[26]:
Ttest_indResult(statistic=-0.03191547918568419, pvalue=0.974545400833041)
In [27]:
train_data['Patient Gender'].value_counts()
Out[27]:
M    1328
F     962
Name: Patient Gender, dtype: int64

We concur that the training data, after sampling, revealed 57.9% male patients and 42.1% female patients.

Checking the val_data distribution for changes in the Age distribution of Male patients diagnosed with Pneumonia:

In [28]:
scipy.stats.ttest_ind(
    all_xray_df['Patient Age'][(all_xray_df['Pneumonia']==True) & (all_xray_df['Patient Gender']=='M')],
    val_data['Patient Age'][(val_data['Pneumonia']==True) & (val_data['Patient Gender']=='M')])
Out[28]:
Ttest_indResult(statistic=0.2808128089163631, pvalue=0.7789115476875096)

Checking the val_data distribution for changes in the Age distribution of Female patients diagnosed with Pneumonia:

In [29]:
scipy.stats.ttest_ind(
    all_xray_df['Patient Age'][(all_xray_df['Pneumonia']== True) & (all_xray_df['Patient Gender']=='F')],
    val_data['Patient Age'][(val_data['Pneumonia']==True) & (val_data['Patient Gender']=='F')])
Out[29]:
Ttest_indResult(statistic=0.08069448033475095, pvalue=0.9357078604858081)
In [30]:
val_data['Patient Gender'].value_counts()
Out[30]:
M    846
F    584
Name: Patient Gender, dtype: int64

We concur that the validation data, after sampling, revealed 56.64% male patients and 44.41% female patients.

Note that the TTests depicted above revealed that both Age and Gender data distributions, in both train_data and val_data, reflect the general data's demographic of distributions.

In [31]:
train_data['View Position'].value_counts()
Out[31]:
PA    1208
AP    1082
Name: View Position, dtype: int64

We concur that the training data, after sampling, revealed 52.4% PAand 47.6% AP viewing positions.

In [32]:
val_data['View Position'].value_counts()
Out[32]:
PA    788
AP    642
Name: View Position, dtype: int64

We concur that the validation data, after sampling, revealed 55.66% PAand 44.34% AP viewing positions.

Now we can begin our model-building & training

First suggestion: perform some image augmentation on your data

In [33]:
## Hyperparameter setup.
IMG_SIZE = (224, 224) # Used for data augmentation.
TARGET_SIZE = (512, 512) # Used for data augmentation.
LR_RATE = 1e-5 # Learning rate.
LOSS = 'binary_crossentropy' # Loss function.
METRICS = [ # Accuracy metrics.
    'binary_accuracy']
NUM_EPOCHS = 15 # Number of epochs.

## When it comes to "Out Of Memory" (OOM) errors when training,
# the most straightforward thing to do is to reduce the BATCH_SIZE hyperparameter.
BATCH_SIZE = 8 # Batch size.
In [34]:
def my_image_augmentation(**vargs):
    
    ## Recommendation here to implement a package like Keras' ImageDataGenerator
    ## with some of the built-in augmentations.
    
    ## Keep an eye out for types of augmentation that are or are not appropriate for medical imaging data.
    ## Also keep in mind what sort of augmentation is or is not appropriate for testing vs validation data.
    
    ## STAND-OUT SUGGESTION: implement some of your own custom augmentation that's *not*
    ## built into something like a Keras package.
    
    # Args.
    rotation = vargs['rotation'] if 'rotation' in vargs else 20
    shear = vargs['shear'] if 'shear' in vargs else .15
    zoom = vargs['zoom'] if 'rotation' in vargs else .12
    train = vargs['train']
    
    if train == True:
        my_idg = ImageDataGenerator(
            rescale=1./255., 
            horizontal_flip=True, 
            vertical_flip=False, 
            height_shift_range=.1, 
            width_shift_range=.1, 
            rotation_range=rotation, 
            shear_range=shear,
            zoom_range=zoom)
    else:
        my_idg = ImageDataGenerator(rescale=1./255.)
    
    return my_idg


## Create the actual generators using the output of my_image_augmentation for your training data.
def make_train_gen(**vargs):
        
    # Args.
    df = vargs['df']
    
    aug = my_image_augmentation(**vargs)
    
    # Suggestion here to use the flow_from_dataframe library:
    train_gen = aug.flow_from_dataframe(
        dataframe=df, 
        directory=None, 
        x_col='path',
        y_col='pneumonia_class',
        class_mode='binary',
        target_size=IMG_SIZE, 
        batch_size=BATCH_SIZE,
        seed=RND_STATE)

    return train_gen


def make_val_gen(**vargs):
        
    # Args.
    df = vargs['df']
    
    aug = my_image_augmentation(**vargs)
        
    val_gen = aug.flow_from_dataframe(
        dataframe=df, 
        directory=None, 
        x_col='path',
        y_col='pneumonia_class',
        class_mode='binary',
        target_size=IMG_SIZE, 
        batch_size=len(val_data),
        seed=RND_STATE)

    return val_gen
In [35]:
train_gen = make_train_gen(df=train_data, train=True)
val_gen = make_val_gen(df=val_data, train=False)
Found 2290 validated image filenames belonging to 2 classes.
Found 1430 validated image filenames belonging to 2 classes.
In [36]:
train_gen.class_indices
Out[36]:
{'Negative': 0, 'Positive': 1}
In [37]:
## May want to pull a single large batch of random validation data for testing after each epoch:
valX, valY = val_gen.next()
In [38]:
## May want to look at some examples of our augmented training data. 
## This is helpful for understanding the extent to which data is being manipulated prior to training, 
## and can be compared with how the raw data look prior to augmentation.

t_x, t_y = next(train_gen)
fig, m_axs = plt.subplots(2, 4, figsize=(14,8))
for (c_x, c_y, c_ax) in zip(t_x, t_y, m_axs.flatten()):
    c_ax.imshow(c_x[:,:,0], cmap='bone')
    if c_y == 1: 
        c_ax.set_title('Pneumonia')
    else:
        c_ax.set_title('No Pneumonia')
    c_ax.axis('off')
    c_ax.set_visible(True)
fig.tight_layout()    
fig.savefig('./out/examples_of_augmented_training_data.png', dpi=300)
plt.show()
plt.draw()
<Figure size 432x288 with 0 Axes>
In [39]:
fig, m_axs = plt.subplots(2, 4, figsize=(14,6))
for (c_x, c_y, c_ax) in zip(t_x, t_y, m_axs.flatten()):
    c_ax.hist(c_x[:,:,0].flatten(), bins=256)
    if c_y == 1: 
        c_ax.set_title('Pneumonia')
    else:
        c_ax.set_title('No Pneumonia')
    c_ax.set_visible(True)
fig.tight_layout()
fig.savefig('./out/examples_of_augmented_training_data_hist.png', dpi=300)
plt.show()
plt.draw()
<Figure size 432x288 with 0 Axes>

Build your model:

Recommendation here to use a pre-trained network downloaded from Keras for fine-tuning.

In [40]:
## Loads the pre-trained model.
def load_pretrained(**vargs):
    
    # Import the pre-trained model.
    pretrained_model = VGG16(include_top=True, weights='imagenet') # input_shape=(512,512,3)
    transfer_layer = pretrained_model.get_layer('block5_pool')
    model = Model(inputs=pretrained_model.input, outputs=transfer_layer.output)
    
    # Trainable?
#     for layer in model.layers[0:17]:
#         layer.trainable = True # False
    
    print('{} Layers:'.format(pretrained_model.name.upper()))
    for layer in model.layers:
        print(f'%s: Trainable? %s' % (layer.name, layer.trainable))
    
    return model
In [41]:
def build(**vargs):
        
    # Loads the pre-trained model.
    pretrained_model = load_pretrained()
    
    # Args.
    end_layer_freeze_at = vargs['end_layer_freeze_at']
    specific_layer = vargs['specific_layer']
    fc_list = vargs['fc_list']
    model_name = vargs['name']
    
    if specific_layer == None:        
        for layer in pretrained_model.layers[0:end_layer_freeze_at]:
            layer.trainable = False
            
    else: # If specific_layer != False.
        for ind, layer in enumerate(pretrained_model.layers):
            if ind != specific_layer: 
                layer.trainable = False
                
        for layer in pretrained_model.layers:
            print(layer.name, layer.trainable)

    # my_model = Sequential()
    # ....add your pre-trained model, and then whatever additional layers you think you might
    # want for fine-tuning (Flatteen, Dense, Dropout, etc.).
    
    # if you want to compile your model within this function, consider which layers of your pre-trained model, 
    # you want to freeze before you compile.
    
    # Also make sure you set your optimizer, loss function, and metrics to monitor.
    model = Sequential(name=model_name)
    model.add(pretrained_model)

    # Flatten the output of the model because it is from a convolutional layer.
    model.add(Flatten())

    if len(fc_list) > 0:
        for i, num in enumerate(fc_list):    
            model.add(Dropout(.5))
            model.add(Dense(num, activation='relu'))
    
    # Dence layer.
    model.add(Dense(1, activation='sigmoid'))
    
    # Model optimization: define the learning rate, loss, and metrics.
    optimizer = Adam(lr=LR_RATE)
    
    # Compile the model.
    model.compile(optimizer=optimizer, loss=LOSS, metrics=METRICS)
    
    return model


## STAND-OUT Suggestion: choose another output layer besides just the last classification layer of your modele
## to output class activation maps to aid in clinical interpretation of your model's results.
In [42]:
pretrained_model = load_pretrained()
VGG16 Layers:
input_1: Trainable? True
block1_conv1: Trainable? True
block1_conv2: Trainable? True
block1_pool: Trainable? True
block2_conv1: Trainable? True
block2_conv2: Trainable? True
block2_pool: Trainable? True
block3_conv1: Trainable? True
block3_conv2: Trainable? True
block3_conv3: Trainable? True
block3_pool: Trainable? True
block4_conv1: Trainable? True
block4_conv2: Trainable? True
block4_conv3: Trainable? True
block4_pool: Trainable? True
block5_conv1: Trainable? True
block5_conv2: Trainable? True
block5_conv3: Trainable? True
block5_pool: Trainable? True
In [43]:
pretrained_model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
In [44]:
## Get the names of layers of the CNN model:
layer_names = [layer.name for layer in pretrained_model.layers]
layer_names
Out[44]:
['input_1',
 'block1_conv1',
 'block1_conv2',
 'block1_pool',
 'block2_conv1',
 'block2_conv2',
 'block2_pool',
 'block3_conv1',
 'block3_conv2',
 'block3_conv3',
 'block3_pool',
 'block4_conv1',
 'block4_conv2',
 'block4_conv3',
 'block4_pool',
 'block5_conv1',
 'block5_conv2',
 'block5_conv3',
 'block5_pool']
In [45]:
## Get the output of the layers of the CNN model:
layer_outputs = [layer.output for layer in pretrained_model.layers]
layer_outputs
Out[45]:
[<KerasTensor: shape=(None, 224, 224, 3) dtype=float32 (created by layer 'input_1')>,
 <KerasTensor: shape=(None, 224, 224, 64) dtype=float32 (created by layer 'block1_conv1')>,
 <KerasTensor: shape=(None, 224, 224, 64) dtype=float32 (created by layer 'block1_conv2')>,
 <KerasTensor: shape=(None, 112, 112, 64) dtype=float32 (created by layer 'block1_pool')>,
 <KerasTensor: shape=(None, 112, 112, 128) dtype=float32 (created by layer 'block2_conv1')>,
 <KerasTensor: shape=(None, 112, 112, 128) dtype=float32 (created by layer 'block2_conv2')>,
 <KerasTensor: shape=(None, 56, 56, 128) dtype=float32 (created by layer 'block2_pool')>,
 <KerasTensor: shape=(None, 56, 56, 256) dtype=float32 (created by layer 'block3_conv1')>,
 <KerasTensor: shape=(None, 56, 56, 256) dtype=float32 (created by layer 'block3_conv2')>,
 <KerasTensor: shape=(None, 56, 56, 256) dtype=float32 (created by layer 'block3_conv3')>,
 <KerasTensor: shape=(None, 28, 28, 256) dtype=float32 (created by layer 'block3_pool')>,
 <KerasTensor: shape=(None, 28, 28, 512) dtype=float32 (created by layer 'block4_conv1')>,
 <KerasTensor: shape=(None, 28, 28, 512) dtype=float32 (created by layer 'block4_conv2')>,
 <KerasTensor: shape=(None, 28, 28, 512) dtype=float32 (created by layer 'block4_conv3')>,
 <KerasTensor: shape=(None, 14, 14, 512) dtype=float32 (created by layer 'block4_pool')>,
 <KerasTensor: shape=(None, 14, 14, 512) dtype=float32 (created by layer 'block5_conv1')>,
 <KerasTensor: shape=(None, 14, 14, 512) dtype=float32 (created by layer 'block5_conv2')>,
 <KerasTensor: shape=(None, 14, 14, 512) dtype=float32 (created by layer 'block5_conv3')>,
 <KerasTensor: shape=(None, 7, 7, 512) dtype=float32 (created by layer 'block5_pool')>]
In [46]:
## Generate feature maps from the CNN model and log them:
# feature_maps = pretrained_model.predict(pretrained_model.input)

# for layer_name, feature_map in zip(
#     layer_names,
#     feature_maps):print(f"The shape of the {layer_name} is =======>> {feature_map.shape}")

Model 1 (VGG16_v1)

About

  • The architecture of Model 1 uses the last output layer from the downloaded VGG16 pretrained model. No dropout layers were added to the model. Also, no FC-layers were added to the model.
  • Model 1 is assumed to run for 15 epochs to observe the learning outcomes.
  • Due to a VRAM limitation currently present in the working computer, the BATCH_SIZE hyperparameter was reduced to just 8 – to avoid the so called "Out of Memory" (OOM) errors witnessed during the initial training sessions of the model. GPU’s VRAM is 10 GB for reference.
In [47]:
vgg_model_v1 = build(
    name="VGG16_v1",
    end_layer_freeze_at=-1,
    specific_layer=None,
    fc_list=[])
VGG16 Layers:
input_2: Trainable? True
block1_conv1: Trainable? True
block1_conv2: Trainable? True
block1_pool: Trainable? True
block2_conv1: Trainable? True
block2_conv2: Trainable? True
block2_pool: Trainable? True
block3_conv1: Trainable? True
block3_conv2: Trainable? True
block3_conv3: Trainable? True
block3_pool: Trainable? True
block4_conv1: Trainable? True
block4_conv2: Trainable? True
block4_conv3: Trainable? True
block4_pool: Trainable? True
block5_conv1: Trainable? True
block5_conv2: Trainable? True
block5_conv3: Trainable? True
block5_pool: Trainable? True
In [48]:
vgg_model_v1.summary()
Model: "VGG16_v1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 model_1 (Functional)        (None, 7, 7, 512)         14714688  
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 dense (Dense)               (None, 1)                 25089     
                                                                 
=================================================================
Total params: 14,739,777
Trainable params: 25,089
Non-trainable params: 14,714,688
_________________________________________________________________

Start training!

In [49]:
## Below is some helper code that will allow you to add checkpoints to your model,
## This will save the 'best' version of your model by comparing it to previous epochs of training

## Note that you need to choose which metric to monitor for your model's 'best' performance if using this code. 
## The 'patience' parameter is set to 10, meaning that your model will train for ten epochs without seeing
## improvement before quitting

# Add checkpoints to your model.
# This will save the 'best' version of your model by comparing it to previous epochs of training.
def add_checkpoints(**vargs):
    
    # Args.
    model = vargs['model']
    
    weight_path = "./out/{}_{}.best.hdf5".format('xray_classification', model.name)

    checkpoint = ModelCheckpoint(
        weight_path,
        monitor='val_loss', 
        mode='min',
        verbose=1,
        save_best_only=True,
        save_weights_only=True)

    callbacks_list = [checkpoint, EarlyStopping(monitor='val_loss', mode='min', patience=10)]
    
    return weight_path, callbacks_list
In [50]:
weight_path, callbacks_list = add_checkpoints(model=vgg_model_v1)
In [51]:
## train your model.
history = vgg_model_v1.fit_generator(
    train_gen, 
    validation_data=(valX, valY), 
    epochs=NUM_EPOCHS,
    callbacks=callbacks_list)
Epoch 1/15
287/287 [==============================] - ETA: 0s - loss: 0.6949 - binary_accuracy: 0.5144
Epoch 00001: val_loss improved from inf to 0.67599, saving model to ./out\xray_classification_VGG16_v1.best.hdf5
287/287 [==============================] - 42s 136ms/step - loss: 0.6949 - binary_accuracy: 0.5144 - val_loss: 0.6760 - val_binary_accuracy: 0.5916
Epoch 2/15
287/287 [==============================] - ETA: 0s - loss: 0.6855 - binary_accuracy: 0.5502
Epoch 00002: val_loss improved from 0.67599 to 0.65266, saving model to ./out\xray_classification_VGG16_v1.best.hdf5
287/287 [==============================] - 33s 114ms/step - loss: 0.6855 - binary_accuracy: 0.5502 - val_loss: 0.6527 - val_binary_accuracy: 0.6399
Epoch 3/15
287/287 [==============================] - ETA: 0s - loss: 0.6805 - binary_accuracy: 0.5624
Epoch 00003: val_loss did not improve from 0.65266
287/287 [==============================] - 33s 114ms/step - loss: 0.6805 - binary_accuracy: 0.5624 - val_loss: 0.6888 - val_binary_accuracy: 0.5538
Epoch 4/15
287/287 [==============================] - ETA: 0s - loss: 0.6723 - binary_accuracy: 0.5895
Epoch 00004: val_loss improved from 0.65266 to 0.61114, saving model to ./out\xray_classification_VGG16_v1.best.hdf5
287/287 [==============================] - 33s 115ms/step - loss: 0.6723 - binary_accuracy: 0.5895 - val_loss: 0.6111 - val_binary_accuracy: 0.6944
Epoch 5/15
287/287 [==============================] - ETA: 0s - loss: 0.6717 - binary_accuracy: 0.5930
Epoch 00005: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6717 - binary_accuracy: 0.5930 - val_loss: 0.6433 - val_binary_accuracy: 0.6203
Epoch 6/15
287/287 [==============================] - ETA: 0s - loss: 0.6704 - binary_accuracy: 0.5952
Epoch 00006: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6704 - binary_accuracy: 0.5952 - val_loss: 0.6565 - val_binary_accuracy: 0.6035
Epoch 7/15
287/287 [==============================] - ETA: 0s - loss: 0.6672 - binary_accuracy: 0.5978
Epoch 00007: val_loss did not improve from 0.61114
287/287 [==============================] - 37s 130ms/step - loss: 0.6672 - binary_accuracy: 0.5978 - val_loss: 0.6680 - val_binary_accuracy: 0.5867
Epoch 8/15
287/287 [==============================] - ETA: 0s - loss: 0.6608 - binary_accuracy: 0.6061
Epoch 00008: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6608 - binary_accuracy: 0.6061 - val_loss: 0.6563 - val_binary_accuracy: 0.6063
Epoch 9/15
287/287 [==============================] - ETA: 0s - loss: 0.6625 - binary_accuracy: 0.6066
Epoch 00009: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 116ms/step - loss: 0.6625 - binary_accuracy: 0.6066 - val_loss: 0.6564 - val_binary_accuracy: 0.6056
Epoch 10/15
287/287 [==============================] - ETA: 0s - loss: 0.6623 - binary_accuracy: 0.6009
Epoch 00010: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6623 - binary_accuracy: 0.6009 - val_loss: 0.6302 - val_binary_accuracy: 0.6413
Epoch 11/15
287/287 [==============================] - ETA: 0s - loss: 0.6591 - binary_accuracy: 0.6083
Epoch 00011: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 114ms/step - loss: 0.6591 - binary_accuracy: 0.6083 - val_loss: 0.6468 - val_binary_accuracy: 0.6168
Epoch 12/15
287/287 [==============================] - ETA: 0s - loss: 0.6557 - binary_accuracy: 0.6131
Epoch 00012: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6557 - binary_accuracy: 0.6131 - val_loss: 0.6719 - val_binary_accuracy: 0.5755
Epoch 13/15
287/287 [==============================] - ETA: 0s - loss: 0.6539 - binary_accuracy: 0.6170
Epoch 00013: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6539 - binary_accuracy: 0.6170 - val_loss: 0.6189 - val_binary_accuracy: 0.6678
Epoch 14/15
287/287 [==============================] - ETA: 0s - loss: 0.6549 - binary_accuracy: 0.6223
Epoch 00014: val_loss did not improve from 0.61114
287/287 [==============================] - 33s 115ms/step - loss: 0.6549 - binary_accuracy: 0.6223 - val_loss: 0.6418 - val_binary_accuracy: 0.6196
In [52]:
evaluation = vgg_model_v1.evaluate(train_gen)

print(f"Train loss: {evaluation[0] * 100:.2f}%")
print(f"Train accuracy: {evaluation[1] * 100:.2f}%")
287/287 [==============================] - 32s 111ms/step - loss: 0.6538 - binary_accuracy: 0.6223
Train loss: 65.38%
Train accuracy: 62.23%
After training for some time, look at the performance of your model by plotting some performance statistics:

Note, these figures will come in handy for your FDA documentation later in the project.

In [53]:
## Log the performance statistics.
performance = []
In [54]:
history_df = pd.DataFrame(history.history)

performance.append(history_df[history_df['val_loss']==min(history_df['val_loss'])])
performance
Out[54]:
[       loss  binary_accuracy  val_loss  val_binary_accuracy
 3  0.672288          0.58952  0.611138             0.694406]
In [55]:
## After training, make some predictions to assess your model's overall performance.
## Note that detecting pneumonia is hard even for trained expert radiologists, 
## so there is no need to make the model perfect.
vgg_model_v1.load_weights(weight_path)
pred_Y = vgg_model_v1.predict(valX, batch_size=BATCH_SIZE, verbose=True)
179/179 [==============================] - 3s 16ms/step
In [56]:
pred_Y.shape
Out[56]:
(1430, 1)
In [57]:
## Save model history (i.e., train results) to a .csv file.
def save_history_df_to_file(model):
    
    path = "./out/{}_{}.csv".format('history_df', model.name)
    history_df.to_csv(path)
    return
In [58]:
save_history_df_to_file(vgg_model_v1)

Visualization plots (helper functions):

In [59]:
## Hint: can use scikit-learn's built in functions here like roc_curve.

FIG_SIZE = (16, 4)


def plot_auc(model, t_y, p_y):
    
    fpr, tpr, thresholds = roc_curve(t_y, p_y)
    fig, ax = plt.subplots(1, 1, figsize=FIG_SIZE)    
    ax.plot(fpr, tpr, label='AUC = %0.4f' % (auc(fpr, tpr)))
    ax.set_xlabel('False Positive Rate')
    ax.set_ylabel('True Positive Rate') 
    ax.legend(loc='best', fontsize='large')
    ax.set_title('{} ROC Curve'.format(model.name))
    fig.savefig('./out/{}_ROC_Curve.png'.format(model.name), dpi=300)
    return

    
## What other performance statistics do you want to include here besides AUC?

def plot_f1_scores(model, t_y, p_y):
    
    precision, recall, thresholds = precision_recall_curve(t_y, p_y)

    # Compute F1.
    def f1(threshold):

        for i, t in enumerate(thresholds):
            if t > threshold:
                f1_sc = calc_f1_score(precision[i], recall[i])
                if not math.isnan(f1_sc):
                    return f1_sc
                else: 
                    return 0
        return 0

    f1_scores = [f1(t) for t in thresholds]
    idx = np.argmax(np.array(f1_scores, dtype=np.float32))
    
    fig, ax = plt.subplots(1, 1, figsize=FIG_SIZE)    
    ax.plot(thresholds, f1_scores, label='Threshold at MAX F1-score = %.4f' % (thresholds[idx]))
    ax.set_xlabel('Threshold')
    ax.set_ylabel('F1-score') 
    ax.legend(loc='best', fontsize='large')
    ax.set_title('{} F1-scores by Threshold'.format(model.name))
    fig.savefig('./out/{}_F1_scores_by_Threshold.png'.format(model.name), dpi=300)
    return

    
def plot_prec_recall(model, t_y, p_y):
    
    precision, recall, thresholds = precision_recall_curve(t_y, p_y)
    fig, ax = plt.subplots(1, 1, figsize=FIG_SIZE)    
    ax.plot(recall, precision)
    ax.set_xlabel('Recall')
    ax.set_ylabel('Precision')
    ax.set_title('{} Precision/Recall Curve'.format(model.name))
    fig.savefig('./out/{}_Precision_Recall_Curve.png'.format(model.name), dpi=300)
    return


def plot_prec_recall_by_threshold(model, t_y, p_y):
    
    plt.figure(figsize=FIG_SIZE)
    precision, recall, thresholds = precision_recall_curve(t_y, p_y, pos_label=1)
    plt.plot(thresholds, precision[:-1], lw=2, label='Precision')
    plt.plot(thresholds, recall[:-1], lw=2, label='Recall')
    plt.legend(loc='best', fontsize='large')
    plt.xlabel('Threshold')
    plt.ylabel('Precision/Recall')
    plt.title('{} Precision/Recall by Threshold'.format(model.name))
    plt.savefig('./out/{}_Precision_Recall_by_Threshold.png'.format(model.name), dpi=300)
    plt.show()

    
def plot_f1_recall_by_threshold(model, t_y, p_y):
    
    # Compute F1.
    def f1(threshold):

        for i, t in enumerate(thresholds):
            if t > threshold:
                f1_sc = calc_f1_score(precision[i], recall[i])
                if not math.isnan(f1_sc):
                    return f1_sc
                else: 
                    return 0
        return 0

    precision, recall, thresholds = precision_recall_curve(valY, pred_Y)
    f1_scores = [f1(t) for t in thresholds]
    
    plt.figure(figsize=FIG_SIZE)
    plt.plot(thresholds, recall[:-1], lw=2, label='Recall')
    plt.plot(thresholds, f1_scores, label='F1-score')
    plt.xlabel('Threshold')
    plt.ylabel('F1-score')
    plt.legend(loc='best', fontsize='large')
    plt.title('{} F1-scores by Threshold'.format(model.name))
    plt.savefig('./out/{}_F1_scores_by_Threshold.png'.format(model.name), dpi=300)
    plt.show()    

    
## Also consider plotting the history of your model training:

def plot_acc(model, history):
    
    x_vals = np.arange(0, len(history.history["loss"]))
    metrics = ["binary_accuracy", "val_binary_accuracy"]
    labels = ["train_acc", "val_acc"]
    plt.figure(figsize=FIG_SIZE)
    for i, metric in enumerate(metrics):
        plt.plot(x_vals,
                 history.history[metric],
                 label=labels[i])
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend(loc="best", fontsize='large')
    plt.title('{} Training Evolution (Accuracy)'.format(model.name))
    plt.savefig('./out/{}_Training_Evolution_Accuracy.png'.format(model.name), dpi=300)
    plt.show()

    
def plot_loss(model, history):
    
    x_vals = np.arange(0, len(history.history["loss"]))
    metrics = ["loss", "val_loss"]
    labels = ["train_loss", "val_loss", "train_acc", "val_acc"]
    plt.figure(figsize=FIG_SIZE)
    for i, metric in enumerate(metrics):
        plt.plot(x_vals,
                 history.history[metric],
                 label=labels[i])
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc="best", fontsize='large')
    plt.title('{} Training Evolution (Losses)'.format(model.name))
    plt.savefig('./out/{}_Training_Evolution_Losses.png'.format(model.name), dpi=300)
    plt.show()
    
    
## Plot the history of your model training.
def plot_history(model, history):
    
    plot_acc(model, history)
    plot_loss(model, history)
In [60]:
# Calculate the F1-score metric.
def calc_f1_score(prec, recall):
    
    return (2. * (prec*recall) / (prec+recall))


# Obtain the threshold at maximum f1_score => best threshold.
# This function finds the threshold at maximum f1_score computed from precision and recall values.
# It attepts to log the threshold and the corresponding precision and recall values.
def calc_best_threshold(precision, recall):
    
    f1_score = calc_f1_score(precision, recall)
    
    # Find and save the index (idx_max) at maximum f1_score.
    idx_max = np.nanargmax(f1_score)
    best_threshold = thresholds[idx_max]
    
    # Log the corresponding threshold, precision and recall values.
    print(f'Threshold that maximized the F1-score: %.4f' % best_threshold)
    print(f'Corresponding Precision: %.2f' % precision[idx_max])
    print(f'Recall: %.2f' % recall[idx_max])
    
    return best_threshold
In [61]:
## Plot all figures.
plot_history(vgg_model_v1, history)
plot_auc(vgg_model_v1, valY, pred_Y)
plot_f1_scores(vgg_model_v1, valY, pred_Y)
plot_prec_recall(vgg_model_v1, valY, pred_Y)
plot_prec_recall_by_threshold(vgg_model_v1, valY, pred_Y)
plot_f1_recall_by_threshold(vgg_model_v1, valY, pred_Y)
In [62]:
precision, recall, thresholds = precision_recall_curve(valY, pred_Y)

# Compute, and return the F1-score.
def f1(threshold):

    for i, t in enumerate(thresholds):
        if t > threshold:
            f1_sc = calc_f1_score(precision[i], recall[i])
            if not math.isnan(f1_sc):
                return f1_sc
            else: 
                return 0
    return 0

f1_scores = [f1(t) for t in thresholds]
idx = np.argmax(np.array(f1_scores, dtype=np.float32))
f1 = f1_scores[idx]

print(f'Best/Max F1-score: {f1: .4f}\nThreshold at MAX F1-score: {thresholds[idx]: .4f}\nPecision: {precision[idx]: .4f}\nRecall: {recall[idx]: .4f}')
Best/Max F1-score:  0.3790
Threshold at MAX F1-score:  0.4140
Pecision:  0.2569
Recall:  0.7203
In [63]:
## Get the index at a given threshold.
def get_idx(threshold):
    
    for idx, t in enumerate(thresholds):
        if t > threshold:
            return idx

print(f'Precision: {precision[get_idx(.29)]: .3f}, Recall: {recall[get_idx(.29)]: .3f}, F1-score: {f1_scores[get_idx(.29)]: .3f}, Threshold: {thresholds[get_idx(.29)]: .3f}')
print(f'Precision: {precision[get_idx(.3)]: .3f}, Recall: {recall[get_idx(.3)]: .3f}, F1-score: {f1_scores[get_idx(.3)]: .3f}, Threshold: {thresholds[get_idx(.3)]: .3f}')
print(f'Precision: {precision[get_idx(.31)]: .3f}, Recall: {recall[get_idx(.31)]: .3f}, F1-score: {f1_scores[get_idx(.31)]: .3f}, Threshold: {thresholds[get_idx(.31)]: .3f}')
print(f'Precision: {precision[get_idx(.35)]: .3f}, Recall: {recall[get_idx(.35)]: .3f}, F1-score: {f1_scores[get_idx(.35)]: .3f}, Threshold: {thresholds[get_idx(.35)]: .3f}')
Precision:  0.206, Recall:  0.993, F1-score:  0.342, Threshold:  0.291
Precision:  0.210, Recall:  0.990, F1-score:  0.346, Threshold:  0.300
Precision:  0.211, Recall:  0.976, F1-score:  0.347, Threshold:  0.310
Precision:  0.224, Recall:  0.888, F1-score:  0.359, Threshold:  0.350
In [64]:
def log_confusion_matrix_results():
    
    tn, fp, fn, tp = confusion_matrix(valY, (pred_Y > .3).astype(int)).ravel()
    print(f'True Negatives: {tn: .0f}')
    print(f'False Positives: {fp: .0f}')
    print(f'False Negatives: {fn: .0f}')
    print(f'True Positives: {tp: .0f}')
    
log_confusion_matrix_results()
True Negatives:  77
False Positives:  1067
False Negatives:  3
True Positives:  283
In [65]:
## Just save model architecture to a .json:
def save_model_architecture(model):
    
    model_json = model.to_json()
    with open("./out/{}.json".format(model.name), "w") as json_file:
        json_file.write(model_json)
    return
In [66]:
## Save model architecture.
save_model_architecture(model=vgg_model_v1)

Feedback

  • Model 1’s architecture doesn’t seem to be promising since its accuracy is scored at a little over sixty percent (exactly 62.23%).
  • As we look at the graphs (especially both accuracy and loss evolutions) we see that the performance is clearly unstable (with little learning after the training cycle).
  • At this stage one might consider adding Dropouts and Fully Connected (FC) layers to the model and see if this update to the architecture might yield promising results, when compared to the outputs revealed above.

Model 2 (VGG16_v2)

About

  • Like Model 1, Model 2's architecture uses the last output layer from the downloaded VGG16 pretrained model. However, the difference here is in the addition of one dropout layer to the model (i.e., by specifying fc_list=[1024]), consequently including one FC-layer into the model architecture since the length of the list is 1 (see the definition of build(**vargs) for details).
  • Like Model 1, Model 2 is assumed to run for 15 epochs to observe the learning outcomes.
  • Due to a VRAM limitation currently present in the working computer, the BATCH_SIZE hyperparameter was reduced to just 8 – to avoid OOM errors witnessed during the initial training sessions of the model. GPU’s VRAM is 10 GB for reference.
In [67]:
vgg_model_v2 = build(
    name="VGG16_v2",
    end_layer_freeze_at=-1,
    specific_layer=None,
    fc_list=[1024])
VGG16 Layers:
input_3: Trainable? True
block1_conv1: Trainable? True
block1_conv2: Trainable? True
block1_pool: Trainable? True
block2_conv1: Trainable? True
block2_conv2: Trainable? True
block2_pool: Trainable? True
block3_conv1: Trainable? True
block3_conv2: Trainable? True
block3_conv3: Trainable? True
block3_pool: Trainable? True
block4_conv1: Trainable? True
block4_conv2: Trainable? True
block4_conv3: Trainable? True
block4_pool: Trainable? True
block5_conv1: Trainable? True
block5_conv2: Trainable? True
block5_conv3: Trainable? True
block5_pool: Trainable? True
In [68]:
vgg_model_v2.summary()
Model: "VGG16_v2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 model_2 (Functional)        (None, 7, 7, 512)         14714688  
                                                                 
 flatten_1 (Flatten)         (None, 25088)             0         
                                                                 
 dropout (Dropout)           (None, 25088)             0         
                                                                 
 dense_1 (Dense)             (None, 1024)              25691136  
                                                                 
 dense_2 (Dense)             (None, 1)                 1025      
                                                                 
=================================================================
Total params: 40,406,849
Trainable params: 25,692,161
Non-trainable params: 14,714,688
_________________________________________________________________
In [69]:
weight_path, callbacks_list = add_checkpoints(model=vgg_model_v2)
In [70]:
## train your model.
history = vgg_model_v2.fit_generator(
    train_gen, 
    validation_data=(valX, valY), 
    epochs=NUM_EPOCHS,
    callbacks=callbacks_list)
Epoch 1/15
287/287 [==============================] - ETA: 0s - loss: 0.7375 - binary_accuracy: 0.5332
Epoch 00001: val_loss improved from inf to 0.76799, saving model to ./out\xray_classification_VGG16_v2.best.hdf5
287/287 [==============================] - 34s 117ms/step - loss: 0.7375 - binary_accuracy: 0.5332 - val_loss: 0.7680 - val_binary_accuracy: 0.4476
Epoch 2/15
287/287 [==============================] - ETA: 0s - loss: 0.6943 - binary_accuracy: 0.5769
Epoch 00002: val_loss improved from 0.76799 to 0.72777, saving model to ./out\xray_classification_VGG16_v2.best.hdf5
287/287 [==============================] - 34s 117ms/step - loss: 0.6943 - binary_accuracy: 0.5769 - val_loss: 0.7278 - val_binary_accuracy: 0.5350
Epoch 3/15
287/287 [==============================] - ETA: 0s - loss: 0.6834 - binary_accuracy: 0.5838
Epoch 00003: val_loss improved from 0.72777 to 0.61447, saving model to ./out\xray_classification_VGG16_v2.best.hdf5
287/287 [==============================] - 34s 117ms/step - loss: 0.6834 - binary_accuracy: 0.5838 - val_loss: 0.6145 - val_binary_accuracy: 0.6727
Epoch 4/15
287/287 [==============================] - ETA: 0s - loss: 0.6788 - binary_accuracy: 0.5865
Epoch 00004: val_loss improved from 0.61447 to 0.53757, saving model to ./out\xray_classification_VGG16_v2.best.hdf5
287/287 [==============================] - 33s 115ms/step - loss: 0.6788 - binary_accuracy: 0.5865 - val_loss: 0.5376 - val_binary_accuracy: 0.7413
Epoch 5/15
287/287 [==============================] - ETA: 0s - loss: 0.6722 - binary_accuracy: 0.5961
Epoch 00005: val_loss did not improve from 0.53757
287/287 [==============================] - 33s 114ms/step - loss: 0.6722 - binary_accuracy: 0.5961 - val_loss: 0.6317 - val_binary_accuracy: 0.6413
Epoch 6/15
287/287 [==============================] - ETA: 0s - loss: 0.6717 - binary_accuracy: 0.6114
Epoch 00006: val_loss did not improve from 0.53757
287/287 [==============================] - 33s 115ms/step - loss: 0.6717 - binary_accuracy: 0.6114 - val_loss: 0.6821 - val_binary_accuracy: 0.5930
Epoch 7/15
287/287 [==============================] - ETA: 0s - loss: 0.6731 - binary_accuracy: 0.5943
Epoch 00007: val_loss did not improve from 0.53757
287/287 [==============================] - 33s 116ms/step - loss: 0.6731 - binary_accuracy: 0.5943 - val_loss: 0.5839 - val_binary_accuracy: 0.6881
Epoch 8/15
287/287 [==============================] - ETA: 0s - loss: 0.6703 - binary_accuracy: 0.6013
Epoch 00008: val_loss did not improve from 0.53757
287/287 [==============================] - 33s 115ms/step - loss: 0.6703 - binary_accuracy: 0.6013 - val_loss: 0.5431 - val_binary_accuracy: 0.7378
Epoch 9/15
287/287 [==============================] - ETA: 0s - loss: 0.6583 - binary_accuracy: 0.6131
Epoch 00009: val_loss did not improve from 0.53757
287/287 [==============================] - 33s 116ms/step - loss: 0.6583 - binary_accuracy: 0.6131 - val_loss: 0.6749 - val_binary_accuracy: 0.6028
Epoch 10/15
287/287 [==============================] - ETA: 0s - loss: 0.6605 - binary_accuracy: 0.6140
Epoch 00010: val_loss improved from 0.53757 to 0.53088, saving model to ./out\xray_classification_VGG16_v2.best.hdf5
287/287 [==============================] - 33s 115ms/step - loss: 0.6605 - binary_accuracy: 0.6140 - val_loss: 0.5309 - val_binary_accuracy: 0.7427
Epoch 11/15
287/287 [==============================] - ETA: 0s - loss: 0.6577 - binary_accuracy: 0.6140
Epoch 00011: val_loss did not improve from 0.53088
287/287 [==============================] - 33s 115ms/step - loss: 0.6577 - binary_accuracy: 0.6140 - val_loss: 0.5740 - val_binary_accuracy: 0.6993
Epoch 12/15
287/287 [==============================] - ETA: 0s - loss: 0.6567 - binary_accuracy: 0.6231
Epoch 00012: val_loss did not improve from 0.53088
287/287 [==============================] - 33s 114ms/step - loss: 0.6567 - binary_accuracy: 0.6231 - val_loss: 0.5463 - val_binary_accuracy: 0.7280
Epoch 13/15
287/287 [==============================] - ETA: 0s - loss: 0.6513 - binary_accuracy: 0.6279
Epoch 00013: val_loss did not improve from 0.53088
287/287 [==============================] - 33s 115ms/step - loss: 0.6513 - binary_accuracy: 0.6279 - val_loss: 0.7302 - val_binary_accuracy: 0.5601
Epoch 14/15
287/287 [==============================] - ETA: 0s - loss: 0.6469 - binary_accuracy: 0.6297
Epoch 00014: val_loss did not improve from 0.53088
287/287 [==============================] - 33s 114ms/step - loss: 0.6469 - binary_accuracy: 0.6297 - val_loss: 0.8066 - val_binary_accuracy: 0.4958
Epoch 15/15
287/287 [==============================] - ETA: 0s - loss: 0.6533 - binary_accuracy: 0.6192
Epoch 00015: val_loss did not improve from 0.53088
287/287 [==============================] - 33s 114ms/step - loss: 0.6533 - binary_accuracy: 0.6192 - val_loss: 0.5982 - val_binary_accuracy: 0.6713
In [71]:
evaluation = vgg_model_v2.evaluate(train_gen)

print(f"Train loss: {evaluation[0] * 100:.2f}%")
print(f"Train accuracy: {evaluation[1] * 100:.2f}%")
287/287 [==============================] - 31s 109ms/step - loss: 0.6232 - binary_accuracy: 0.6472
Train loss: 62.32%
Train accuracy: 64.72%

After training, wel look at the performance of this model by plotting some performance statistics.

Again, these figures will come in handy for the FDA documentation later in the project.

In [72]:
history_df = pd.DataFrame(history.history)

performance.append(history_df[history_df['val_loss']==min(history_df['val_loss'])])
performance
Out[72]:
[       loss  binary_accuracy  val_loss  val_binary_accuracy
 3  0.672288          0.58952  0.611138             0.694406,
        loss  binary_accuracy  val_loss  val_binary_accuracy
 9  0.660489         0.613974   0.53088             0.742657]
In [73]:
## After training, make some predictions to assess your model's overall performance.
## Note that detecting pneumonia is hard even for trained expert radiologists, 
## so there is no need to make the model perfect.
vgg_model_v2.load_weights(weight_path)
pred_Y = vgg_model_v2.predict(valX, batch_size=BATCH_SIZE, verbose=True)
179/179 [==============================] - 2s 13ms/step
In [74]:
pred_Y.shape
Out[74]:
(1430, 1)
In [75]:
save_history_df_to_file(vgg_model_v2)
In [76]:
## Plot all figures.
plot_history(vgg_model_v2, history)
plot_auc(vgg_model_v2, valY, pred_Y)
plot_f1_scores(vgg_model_v2, valY, pred_Y)
plot_prec_recall(vgg_model_v2, valY, pred_Y)
plot_prec_recall_by_threshold(vgg_model_v2, valY, pred_Y)
plot_f1_recall_by_threshold(vgg_model_v2, valY, pred_Y)
In [77]:
precision, recall, thresholds = precision_recall_curve(valY, pred_Y)

# Compute, and return the F1-score.
def f1(threshold):

    for i, t in enumerate(thresholds):
        if t > threshold:
            f1_sc = calc_f1_score(precision[i], recall[i])
            if not math.isnan(f1_sc):
                return f1_sc
            else: 
                return 0
    return 0

f1_scores = [f1(t) for t in thresholds]
idx = np.argmax(np.array(f1_scores, dtype=np.float32))
f1 = f1_scores[idx]

print(f'Best/Max F1-score: {f1: .4f}\nThreshold at MAX F1-score: {thresholds[idx]: .4f}\nPecision: {precision[idx]: .4f}\nRecall: {recall[idx]: .4f}')
Best/Max F1-score:  0.4166
Threshold at MAX F1-score:  0.2934
Pecision:  0.2825
Recall:  0.7902
In [78]:
print(f'Precision: {precision[get_idx(.29)]: .3f}, Recall: {recall[get_idx(.29)]: .3f}, F1-score: {f1_scores[get_idx(.29)]: .3f}, Threshold: {thresholds[get_idx(.29)]: .3f}')
print(f'Precision: {precision[get_idx(.3)]: .3f}, Recall: {recall[get_idx(.3)]: .3f}, F1-score: {f1_scores[get_idx(.3)]: .3f}, Threshold: {thresholds[get_idx(.3)]: .3f}')
print(f'Precision: {precision[get_idx(.31)]: .3f}, Recall: {recall[get_idx(.31)]: .3f}, F1-score: {f1_scores[get_idx(.31)]: .3f}, Threshold: {thresholds[get_idx(.31)]: .3f}')
print(f'Precision: {precision[get_idx(.35)]: .3f}, Recall: {recall[get_idx(.35)]: .3f}, F1-score: {f1_scores[get_idx(.35)]: .3f}, Threshold: {thresholds[get_idx(.35)]: .3f}')
Precision:  0.279, Recall:  0.790, F1-score:  0.413, Threshold:  0.290
Precision:  0.281, Recall:  0.762, F1-score:  0.411, Threshold:  0.300
Precision:  0.283, Recall:  0.748, F1-score:  0.411, Threshold:  0.310
Precision:  0.293, Recall:  0.661, F1-score:  0.406, Threshold:  0.352
In [79]:
log_confusion_matrix_results()
True Negatives:  585
False Positives:  559
False Negatives:  68
True Positives:  218
In [80]:
## Save model architecture.
save_model_architecture(model=vgg_model_v2)

Feedback

  • This is an interesting situation here, especially after finding out that the simple update to Model 1’s architecture resulted in a relatively higher accuracy of 64.72% (which is a small +2.49% delta) for Model 2.
  • The Accuracy/Loss graphs plotted for Model 2 showed a steadiness in training evolution over time. Compared to Model 1’s plots, Model 2 seems to be slightly better and has outperformed Model 1 by a little (yet still not convincing for being a trustworthy model for general use due to being lower than eighty percent in general). Nevertheless, this finding suggests such a modification is indeed promising for further experimentations.
  • At this stage one might consider adding more Dropouts and Fully Connected (FC) layers to the model and see if this update to the architecture might yield promising results, when compared to the outputs revealed above.
In [81]:
## We will find the threshold that optimize this model's performance,
## and use it to make binary classifications later (see below).\

# Precision and Recall are already obtained previously.
BEST_THRESHOLD_MODEL_2 = calc_best_threshold(precision, recall)

print(f'{vgg_model_v2.name}\'s best/max threshold: {BEST_THRESHOLD_MODEL_2: .4f}')
Threshold that maximized the F1-score: 0.2936
Corresponding Precision: 0.28
Recall: 0.79
VGG16_v2's best/max threshold:  0.2936
In [82]:
## Saved for later use:
pred_Y_vgg_model_v2 = pred_Y

Model 3 (VGG16_v3)

About

  • Like Model 2, Model 3's architecture uses the last output layer from the downloaded VGG16 pretrained model. However, the difference here is in the addition of four dropout layers to the model (i.e., by specifying fc_list=[1024, 512, 256, 128]), consequently including foru FC-layers into the model architecture since the length of the list is 4 (see the definition of build(**vargs) for details).
  • Like Model 2, Model 3 is assumed to run for 15 epochs to observe the learning outcomes.
  • Due to a VRAM limitation currently present in the working computer, the BATCH_SIZE hyperparameter was reduced to just 8 – to avoid OOM errors witnessed during the initial training sessions of the model. GPU’s VRAM is 10 GB for reference.
In [83]:
vgg_model_v3 = build(
    name="VGG16_v3",
    end_layer_freeze_at=-1,
    specific_layer=None,
    fc_list=[1024, 512, 256, 128])
VGG16 Layers:
input_4: Trainable? True
block1_conv1: Trainable? True
block1_conv2: Trainable? True
block1_pool: Trainable? True
block2_conv1: Trainable? True
block2_conv2: Trainable? True
block2_pool: Trainable? True
block3_conv1: Trainable? True
block3_conv2: Trainable? True
block3_conv3: Trainable? True
block3_pool: Trainable? True
block4_conv1: Trainable? True
block4_conv2: Trainable? True
block4_conv3: Trainable? True
block4_pool: Trainable? True
block5_conv1: Trainable? True
block5_conv2: Trainable? True
block5_conv3: Trainable? True
block5_pool: Trainable? True
In [84]:
vgg_model_v3.summary()
Model: "VGG16_v3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 model_3 (Functional)        (None, 7, 7, 512)         14714688  
                                                                 
 flatten_2 (Flatten)         (None, 25088)             0         
                                                                 
 dropout_1 (Dropout)         (None, 25088)             0         
                                                                 
 dense_3 (Dense)             (None, 1024)              25691136  
                                                                 
 dropout_2 (Dropout)         (None, 1024)              0         
                                                                 
 dense_4 (Dense)             (None, 512)               524800    
                                                                 
 dropout_3 (Dropout)         (None, 512)               0         
                                                                 
 dense_5 (Dense)             (None, 256)               131328    
                                                                 
 dropout_4 (Dropout)         (None, 256)               0         
                                                                 
 dense_6 (Dense)             (None, 128)               32896     
                                                                 
 dense_7 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 41,094,977
Trainable params: 26,380,289
Non-trainable params: 14,714,688
_________________________________________________________________
In [85]:
weight_path, callbacks_list = add_checkpoints(model=vgg_model_v3)
In [86]:
## train your model.
history = vgg_model_v3.fit_generator(
    train_gen, 
    validation_data=(valX, valY), 
    epochs=NUM_EPOCHS,
    callbacks=callbacks_list)
Epoch 1/15
287/287 [==============================] - ETA: 0s - loss: 0.8288 - binary_accuracy: 0.5022
Epoch 00001: val_loss improved from inf to 0.67793, saving model to ./out\xray_classification_VGG16_v3.best.hdf5
287/287 [==============================] - 34s 116ms/step - loss: 0.8288 - binary_accuracy: 0.5022 - val_loss: 0.6779 - val_binary_accuracy: 0.6909
Epoch 2/15
287/287 [==============================] - ETA: 0s - loss: 0.7809 - binary_accuracy: 0.4978
Epoch 00002: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 116ms/step - loss: 0.7809 - binary_accuracy: 0.4978 - val_loss: 0.7160 - val_binary_accuracy: 0.2510
Epoch 3/15
287/287 [==============================] - ETA: 0s - loss: 0.7659 - binary_accuracy: 0.5000
Epoch 00003: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 116ms/step - loss: 0.7659 - binary_accuracy: 0.5000 - val_loss: 0.7084 - val_binary_accuracy: 0.3070
Epoch 4/15
287/287 [==============================] - ETA: 0s - loss: 0.7474 - binary_accuracy: 0.5223
Epoch 00004: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 115ms/step - loss: 0.7474 - binary_accuracy: 0.5223 - val_loss: 0.7088 - val_binary_accuracy: 0.2874
Epoch 5/15
287/287 [==============================] - ETA: 0s - loss: 0.7464 - binary_accuracy: 0.4921
Epoch 00005: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 115ms/step - loss: 0.7464 - binary_accuracy: 0.4921 - val_loss: 0.7042 - val_binary_accuracy: 0.3483
Epoch 6/15
287/287 [==============================] - ETA: 0s - loss: 0.7411 - binary_accuracy: 0.5000
Epoch 00006: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 114ms/step - loss: 0.7411 - binary_accuracy: 0.5000 - val_loss: 0.6895 - val_binary_accuracy: 0.5678
Epoch 7/15
287/287 [==============================] - ETA: 0s - loss: 0.7350 - binary_accuracy: 0.4987
Epoch 00007: val_loss did not improve from 0.67793
287/287 [==============================] - 33s 114ms/step - loss: 0.7350 - binary_accuracy: 0.4987 - val_loss: 0.7010 - val_binary_accuracy: 0.4105
Epoch 8/15
287/287 [==============================] - ETA: 0s - loss: 0.7206 - binary_accuracy: 0.5131
Epoch 00008: val_loss improved from 0.67793 to 0.67045, saving model to ./out\xray_classification_VGG16_v3.best.hdf5
287/287 [==============================] - 33s 115ms/step - loss: 0.7206 - binary_accuracy: 0.5131 - val_loss: 0.6704 - val_binary_accuracy: 0.7455
Epoch 9/15
287/287 [==============================] - ETA: 0s - loss: 0.7254 - binary_accuracy: 0.5192
Epoch 00009: val_loss did not improve from 0.67045
287/287 [==============================] - 33s 115ms/step - loss: 0.7254 - binary_accuracy: 0.5192 - val_loss: 0.6879 - val_binary_accuracy: 0.5636
Epoch 10/15
287/287 [==============================] - ETA: 0s - loss: 0.7211 - binary_accuracy: 0.5109
Epoch 00010: val_loss did not improve from 0.67045
287/287 [==============================] - 33s 116ms/step - loss: 0.7211 - binary_accuracy: 0.5109 - val_loss: 0.6921 - val_binary_accuracy: 0.5224
Epoch 11/15
287/287 [==============================] - ETA: 0s - loss: 0.7124 - binary_accuracy: 0.5218
Epoch 00011: val_loss did not improve from 0.67045
287/287 [==============================] - 33s 116ms/step - loss: 0.7124 - binary_accuracy: 0.5218 - val_loss: 0.6725 - val_binary_accuracy: 0.6699
Epoch 12/15
287/287 [==============================] - ETA: 0s - loss: 0.7069 - binary_accuracy: 0.5293
Epoch 00012: val_loss improved from 0.67045 to 0.66826, saving model to ./out\xray_classification_VGG16_v3.best.hdf5
287/287 [==============================] - 34s 118ms/step - loss: 0.7069 - binary_accuracy: 0.5293 - val_loss: 0.6683 - val_binary_accuracy: 0.6552
Epoch 13/15
287/287 [==============================] - ETA: 0s - loss: 0.7151 - binary_accuracy: 0.5070
Epoch 00013: val_loss did not improve from 0.66826
287/287 [==============================] - 33s 117ms/step - loss: 0.7151 - binary_accuracy: 0.5070 - val_loss: 0.6779 - val_binary_accuracy: 0.6469
Epoch 14/15
287/287 [==============================] - ETA: 0s - loss: 0.7185 - binary_accuracy: 0.5057
Epoch 00014: val_loss did not improve from 0.66826
287/287 [==============================] - 33s 116ms/step - loss: 0.7185 - binary_accuracy: 0.5057 - val_loss: 0.6791 - val_binary_accuracy: 0.6266
Epoch 15/15
287/287 [==============================] - ETA: 0s - loss: 0.7068 - binary_accuracy: 0.5358
Epoch 00015: val_loss did not improve from 0.66826
287/287 [==============================] - 33s 116ms/step - loss: 0.7068 - binary_accuracy: 0.5358 - val_loss: 0.6684 - val_binary_accuracy: 0.6720
In [87]:
evaluation = vgg_model_v3.evaluate(train_gen)

print(f"Train loss: {evaluation[0] * 100:.2f}%")
print(f"Train accuracy: {evaluation[1] * 100:.2f}%")
287/287 [==============================] - 31s 109ms/step - loss: 0.6856 - binary_accuracy: 0.5721
Train loss: 68.56%
Train accuracy: 57.21%

After training, wel look at the performance of this model by plotting some performance statistics.

Again, these figures will come in handy for the FDA documentation later in the project.

In [88]:
history_df = pd.DataFrame(history.history)

performance.append(history_df[history_df['val_loss']==min(history_df['val_loss'])])
performance
Out[88]:
[       loss  binary_accuracy  val_loss  val_binary_accuracy
 3  0.672288          0.58952  0.611138             0.694406,
        loss  binary_accuracy  val_loss  val_binary_accuracy
 9  0.660489         0.613974   0.53088             0.742657,
         loss  binary_accuracy  val_loss  val_binary_accuracy
 11  0.706919         0.529258  0.668257             0.655245]
In [89]:
## After training, make some predictions to assess your model's overall performance.
## Note that detecting pneumonia is hard even for trained expert radiologists, 
## so there is no need to make the model perfect.
vgg_model_v3.load_weights(weight_path)
pred_Y = vgg_model_v3.predict(valX, batch_size=BATCH_SIZE, verbose=True)
179/179 [==============================] - 3s 13ms/step
In [90]:
pred_Y.shape
Out[90]:
(1430, 1)
In [91]:
save_history_df_to_file(vgg_model_v3)
In [92]:
## Plot all figures.
plot_history(vgg_model_v3, history)
plot_auc(vgg_model_v3, valY, pred_Y)
plot_f1_scores(vgg_model_v3, valY, pred_Y)
plot_prec_recall(vgg_model_v3, valY, pred_Y)
plot_prec_recall_by_threshold(vgg_model_v3, valY, pred_Y)
plot_f1_recall_by_threshold(vgg_model_v3, valY, pred_Y)
In [93]:
precision, recall, thresholds = precision_recall_curve(valY, pred_Y)

# Compute, and return the F1-score.
def f1(threshold):

    for i, t in enumerate(thresholds):
        if t > threshold:
            f1_sc = calc_f1_score(precision[i], recall[i])
            if not math.isnan(f1_sc):
                return f1_sc
            else: 
                return 0
    return 0

f1_scores = [f1(t) for t in thresholds]
idx = np.argmax(np.array(f1_scores, dtype=np.float32))
f1 = f1_scores[idx]

print(f'Best/Max F1-score: {f1: .4f}\nThreshold at MAX F1-score: {thresholds[idx]: .4f}\nPecision: {precision[idx]: .4f}\nRecall: {recall[idx]: .4f}')
Best/Max F1-score:  0.3647
Threshold at MAX F1-score:  0.4679
Pecision:  0.2377
Recall:  0.7797
In [94]:
print(f'Precision: {precision[get_idx(.29)]: .3f}, Recall: {recall[get_idx(.29)]: .3f}, F1-score: {f1_scores[get_idx(.29)]: .3f}, Threshold: {thresholds[get_idx(.29)]: .3f}')
print(f'Precision: {precision[get_idx(.3)]: .3f}, Recall: {recall[get_idx(.3)]: .3f}, F1-score: {f1_scores[get_idx(.3)]: .3f}, Threshold: {thresholds[get_idx(.3)]: .3f}')
print(f'Precision: {precision[get_idx(.31)]: .3f}, Recall: {recall[get_idx(.31)]: .3f}, F1-score: {f1_scores[get_idx(.31)]: .3f}, Threshold: {thresholds[get_idx(.31)]: .3f}')
print(f'Precision: {precision[get_idx(.35)]: .3f}, Recall: {recall[get_idx(.35)]: .3f}, F1-score: {f1_scores[get_idx(.35)]: .3f}, Threshold: {thresholds[get_idx(.35)]: .3f}')
Precision:  0.204, Recall:  1.000, F1-score:  0.337, Threshold:  0.439
Precision:  0.204, Recall:  1.000, F1-score:  0.337, Threshold:  0.439
Precision:  0.204, Recall:  1.000, F1-score:  0.337, Threshold:  0.439
Precision:  0.204, Recall:  1.000, F1-score:  0.337, Threshold:  0.439
In [95]:
log_confusion_matrix_results()
True Negatives:  0
False Positives:  1144
False Negatives:  0
True Positives:  286
In [96]:
## Save model architecture.
save_model_architecture(model=vgg_model_v3)

Feedback

  • The results seen here after Model 3’s training cycle are quite unexpected and are disappointing when compared to the rest. In fact, we see a relatively lower train accuracy for *Model 3 (measured at 57.21%). With respect to Model 2**, the decrease delta is measured at -7.51%, which is significant in the evolutions revealed.
  • Model 3’s architecture seems to be the worst of the bunch. Since Model 2 outperformed the other two, it was eventually chosen for the steps proceeding Model 3’s results.
  • It seems that the addition of the Dropout layers, and the Fully Connected (FC) layers to the model didn’t do justice. Perhaps, in later runs ones might consider other variations to the architecture (e.g., by adjusting the number of nodes in the single layer of Model 2, or by adjusting the dropout threshold – which was set to a fixed 50% for both Models 2 and 3. Other approaches might include different pretrained models, in addition to various alterations to the common architectures across all models presented here.

Once you feel you are done training, you'll need to decide the proper classification threshold that optimizes your model's performance for a given metric (e.g. accuracy, F1, precision, etc. You decide).

Model 2 is the Chosen One

In [97]:
## Find the threshold that optimize your model's performance,
## and use that threshold to make binary classification. Make sure you take all your metrics into consideration.

# Todo
BEST_THRESHOLD = BEST_THRESHOLD_MODEL_2
testY = valY
In [98]:
## Let's look at some examples of predicted vs. true with our best model: 

# Todo
fig, m_axs = plt.subplots(10, 10, figsize=(16,16))
i = 0
for (c_x, c_y, c_ax) in zip(valX[0:100], testY[0:100], m_axs.flatten()):
    c_ax.imshow(c_x[:,:,0], cmap='bone')
    if c_y == 1: 
        if pred_Y[i] > BEST_THRESHOLD:
            c_ax.set_title('1, 1')
        else:
            c_ax.set_title('1, 0')
    else:
        if pred_Y[i] > BEST_THRESHOLD: 
            c_ax.set_title('0, 1')
        else:
            c_ax.set_title('0, 0')
    c_ax.axis('off')
    i=i+1
fig.savefig('./out/examples_of_predicted_vs_true_cases_with_{}.png'.format(vgg_model_v2.name))
In [99]:
## Save model architecture.
save_model_architecture(model=vgg_model_v3)
In [112]:
val_data['predicted'] = pred_Y_vgg_model_v2
val_data['predictions'] = val_data.apply(
    lambda df: 1. if df['predicted'] >= BEST_THRESHOLD else .0, axis=1)

val_data.sample(20, random_state=RND_STATE)
Out[112]:
Image Index Finding Labels Follow-up # Patient ID Patient Age Patient Gender View Position OriginalImage[Width Height] OriginalImagePixelSpacing[x ... Nodule Effusion Hernia Emphysema Mass No Finding Atelectasis pneumonia_class predicted predictions
103789 00027706_022.png Infiltration|Pneumonia 22 27706 36 M AP 3056 2544 0.139000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive 0.249846 0.0
101037 00026828_000.png No Finding 0 26828 52 M PA 2992 2991 0.143000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.276821 0.0
108526 00029437_002.png Pneumothorax 2 29437 48 M PA 2021 2021 0.194311 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Negative 0.159863 0.0
42621 00010975_000.png No Finding 0 10975 50 M PA 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.411251 1.0
50393 00012758_001.png Effusion|Pleural_Thickening 1 12758 9 M PA 2478 2509 0.143000 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Negative 0.385716 1.0
25869 00006804_001.png Atelectasis 1 6804 57 F PA 2021 2021 0.194311 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Negative 0.436697 1.0
70805 00017464_003.png No Finding 3 17464 45 F PA 2992 2991 0.143000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.099839 0.0
76197 00018701_000.png Infiltration|Nodule 0 18701 24 M PA 2544 3056 0.139000 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 Negative 0.287230 0.0
37390 00009863_035.png Edema|Pneumonia 35 9863 40 F AP 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive 0.337210 1.0
102039 00027168_000.png Atelectasis 0 27168 60 M AP 3056 2544 0.139000 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Negative 0.205198 0.0
75058 00018404_016.png Atelectasis|Consolidation 16 18404 69 F AP 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Negative 0.356033 1.0
17639 00004755_001.png No Finding 1 4755 38 F AP 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.157909 0.0
1612 00000433_000.png No Finding 0 433 50 F PA 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.196012 0.0
8571 00002270_002.png No Finding 2 2270 64 M PA 2734 2991 0.143000 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 Negative 0.279677 0.0
73389 00018055_007.png Pneumothorax 7 18055 64 M AP 3056 2544 0.139000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Negative 0.537377 1.0
89322 00022178_002.png Atelectasis|Infiltration|Pleural_Thickening 2 22178 66 M PA 2992 2991 0.143000 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Negative 0.249860 0.0
38958 00010190_007.png Infiltration|Pneumonia 7 10190 25 M AP 2544 3056 0.139000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Positive 0.290959 0.0
57093 00014177_011.png Effusion|Infiltration|Pneumonia 11 14177 39 M AP 2500 2048 0.168000 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Positive 0.266543 0.0
67725 00016732_027.png Effusion|Infiltration|Pneumonia 27 16732 80 F AP 2500 2048 0.168000 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 Positive 0.425731 1.0
6336 00001706_001.png Atelectasis|Pneumonia 1 1706 55 M PA 2500 2048 0.168000 ... 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Positive 0.677944 1.0

20 rows × 31 columns

In [102]:
## Compute the positives and negatives predicted by the model.
tn, fp, fn, tp = confusion_matrix(
    val_data.Pneumonia.values,
    val_data.predictions.values,
    labels=[0,1]).ravel()

print(f'True Negatives: {tn: .0f}')
print(f'False Positives: {fp: .0f}')
print(f'False Negatives: {fn: .0f}')
print(f'True Positives: {tp: .0f}')
True Negatives:  512
False Positives:  632
False Negatives:  119
True Positives:  167
In [104]:
## Determine pred_Y_vgg_model_v2's performance without other diseases.
sensitivity = tp / (tp+fn)
specificity = tn / (tn+fp)

print(f'Sensitivity: %.3f' % sensitivity)
print(f'Specificity: %.3f' % specificity)
Sensitivity: 0.584
Specificity: 0.448
In [113]:
## These are the top five diseases that comorbid with Pneumonia, according to the data:
for item in ['Atelectasis', 'Consolidation', 'Edema', 'Effusion', 'Infiltration']:
    tn, fp, fn, tp = confusion_matrix(
        val_data[val_data[item]==1].Pneumonia.values,
        val_data[val_data[item]==1].predictions.values,
        labels=[0,1]).ravel()
    
    sensitivity = tp / (tp+fn)
    specificity = tn / (tn+fp)

    print('For {}:'.format(item.upper()))
    print('--> Sensitivity: %.3f' % sensitivity)
    print('--> Specificity: %.3f\n' % specificity)
For ATELECTASIS:
--> Sensitivity: 0.592
--> Specificity: 0.408

For CONSOLIDATION:
--> Sensitivity: 0.542
--> Specificity: 0.538

For EDEMA:
--> Sensitivity: 0.570
--> Specificity: 0.267

For EFFUSION:
--> Sensitivity: 0.580
--> Specificity: 0.445

For INFILTRATION:
--> Sensitivity: 0.521
--> Specificity: 0.434

Current Limitations

For convenience we shall refer to Model 2 as the Algorithm…

  • The presence of other diseases like Atelectasis, Edema, and Effusion, played an impact on the performance of the algorithm in predicting and depicting Pneumonia cases from the X-rays. This is verified by observing the rather lower performance outputs (as shown above).
  • Such diseases when comorbid with Pneumonia may well have had an impact on the algorithms’ Sensitivity and Specificity (see the above corresponding results for verification).
  • When trying to accurately predict Pneumonia cases, while other diseases such as the top five mentioned above are present, chances are higher for the algorithm’s specificity to be impacted, thus leading to several unwanted/undesirable False Positives in the in the results.
In [145]:
## Number of Pneumonia and Non-Pneumonia cases in the data set.
pos_pneumonia = all_xray_df[all_xray_df['Pneumonia']==1]
neg_pneumonia = all_xray_df[all_xray_df['Pneumonia']==0]

print('Total Pneumonia cases in the data: '+ str(len(pos_pneumonia)))
print('Total Non-Pneumonia cases in the data: {}'.format(len(neg_pneumonia)))
print('Percentage (%) of Pneumonia cases in the data: {}%'.format(
    np.round(100*len(pos_pneumonia)/(len(pos_pneumonia)+len(neg_pneumonia)), 2)))
Total Pneumonia cases in the data: 1431
Total Non-Pneumonia cases in the data: 110689
Percentage (%) of Pneumonia cases in the data: 1.28%
In [158]:
## Pneumonia omorbitiy with other diseases (plot).
plt.figure(figsize=(16,6))
pos_pneumonia['Finding Labels'].value_counts()[0:30].plot(kind='bar', color='navy')
plt.xlabel('Diseases')
plt.ylabel('Number of Cases')
plt.legend(loc='best', fontsize='large')
plt.savefig('./out/Pneumonia_Omorbitiy_with_Other_Diseases.png', dpi=300)

References

In [159]:
## We are done with this notebook!
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)
Out[159]:
{'status': 'ok', 'restart': True}